[fix][ml] Prevent terminated managed ledger from transitioning back to writable state by void-ptr974 · Pull Request #25795 · apache/pulsar

void-ptr974 · 2026-05-16T04:59:30Z

Related issue

Motivation

ManagedLedger.terminate() seals the managed ledger at the current BookKeeper committed boundary. After termination, no new entries should be accepted, and the managed ledger must not
become writable again.

The key invariant is:

Any add operation that is acknowledged successfully to the caller must have a position less than or equal to the final terminatedPosition.

terminate() does not wait for every submitted add to succeed. It closes the current BookKeeper ledger and uses the ledger's final LAC as the terminatedPosition.

Therefore:

adds already acknowledged by BookKeeper and included in LAC may still complete successfully, even if the ManagedLedger callback is delivered after terminate();
adds that have not reached LAC, including queued writes waiting for a future ledger, must fail with ManagedLedgerTerminatedException;
termination must not create or switch to another ledger to replay pending writes.

There is a race between terminate() and ledger rollover:

An add fills the current ledger and triggers rollover.
The managed ledger moves into ClosingLedger / CreatingLedger.
terminate() runs before the rollover create/switch callback finishes and marks the managed ledger as Terminated.
The delayed createComplete() or updateLedgersIdsComplete() callback resumes the old rollover flow.
The callback can set the state back to LedgerOpened, making a terminated managed ledger writable again.

This breaks the terminate semantics. It can also leave pending writes handled as normal rollover writes instead of terminated writes, or leave in-flight add callbacks hanging when
BookKeeper close drains writes that were not included in the final LAC.

Modifications

This change keeps Terminated as the final write state after terminate() takes ownership.

The implementation follows one invariant:

Any add that completes successfully to the caller must be included in the final terminatedPosition.

To keep that invariant, this PR handles each race path explicitly:

Queued adds that have not been sent to BookKeeper
- When terminate() takes ownership, these adds are failed with ManagedLedgerTerminatedException.
- They are only waiting for a future ledger, and terminate must not create or switch to another ledger to replay them.
In-flight adds already sent to BookKeeper
- These adds are not failed immediately.
- BookKeeper decides whether they are inside the final LAC:
  - if BookKeeper has acked the add and advanced LAC, the add may still complete successfully;
  - if BookKeeper close drains the add before it reaches LAC, the add is failed as terminated.
- This preserves the rule that successful add positions are covered by terminatedPosition.
Failed add callbacks after terminate
- Added failAddIfTerminated() for failed add callbacks that arrive after terminate has taken ownership.
- Without this, a drained add would enter the normal write-failure path. In Terminated state, ledgerClosed() returns without replaying or failing the add, leaving the client callback
  hanging.
Late ledger create callbacks
- createComplete() now completes the ledger-create future before terminal-state checks.
- This keeps the timeout checker and pending create-op metric balanced even if the create callback arrives after terminate.
- If the late callback created a ledger after termination, the ledger is closed and deleted because terminated ledgers cannot use a future ledger for pending writes.
Late ledger switch callbacks
- updateLedgersIdsComplete() now returns when the managed ledger is already Terminated.
- This prevents a delayed rollover callback from setting the state back to LedgerOpened.
Replacement ledger creation after close
- The close path no longer creates a replacement ledger when the state is already Terminated.
- Terminate closes the current ledger to seal the committed prefix; it must not continue the rollover flow by creating another writable ledger.

This does not change the public API, protocol, schema, or metric names. It only fixes the behavior of existing state transitions and balances the existing pending ledger-create metric in
the late-callback path.

Verifying this change

Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

terminateDuringLedgerSwitchKeepsTerminatedState
- Simulates a delayed BookKeeper ledger create callback returning after terminate().
- Verifies the managed ledger stays Terminated.
- Verifies queued pending writes fail with ManagedLedgerTerminatedException.
- Verifies the late-created ledger is deleted.
- Verifies the ledger-create future is completed and the pending create-op metric is balanced.
- Verifies reopening the managed ledger still observes the terminated state.
terminatePositionIncludesAddAlreadyAckedByBookKeeper
- Simulates BookKeeper already acking an add and advancing LAC, while the ManagedLedger client callback is delayed.
- Verifies terminate() returns a terminatedPosition that includes the acknowledged add.
- Verifies the delayed client callback can still complete successfully with a position inside the terminated range.
terminateFailsInflightAddDrainedByLedgerClose
- Simulates BookKeeper close draining an outstanding add that has not reached LAC.
- Verifies the add fails with ManagedLedgerTerminatedException.
- Verifies the pending add queue is cleared.
- Verifies the callback does not hang.
ledgerSwitchCompletionDoesNotReopenTerminatedLedger
- Verifies a late ledger-switch completion cannot move the state from Terminated back to LedgerOpened.

Local verification:

./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.ManagedLedgerTerminationTest
./gradlew :managed-ledger:checkstyleMain :managed-ledger:checkstyleTest

Does this pull request potentially affect one of the following parts:

When terminate() races with ledger rollover, a delayed rollover completion can overwrite the Terminated state and move the ManagedLedger back to LedgerOpened. The concrete race is: an add fills the current ledger and starts creating the next ledger, terminate() marks the ManagedLedger as Terminated, then the delayed createComplete/updateLedgersIdsComplete callback resumes the old rollover path and reopens the ledger for writes. This commit makes the rollover completion path respect Terminated as a final write state. Late createComplete callbacks close the newly created ledger handle and fail queued adds with ManagedLedgerTerminatedException. Late updateLedgersIdsComplete callbacks return without setting LedgerOpened, and the close-ledger path no longer creates a replacement ledger after termination.

Denovo1998 · 2026-05-26T13:30:32Z

                            State state = STATE_UPDATER.get(ManagedLedgerImpl.this);
-                            if (state == State.Closed || state.isFenced()) {
-                                log.debug().log("skip ledger update after create complete ledger is closed or fenced");
+                            if (state == State.Closed || state == State.Terminated || state.isFenced()) {


I think this branch still leaves one terminate-vs-rollover race unresolved. By the time this callback reaches operationComplete, updateLedgersListAfterRollover has already successfully written metadata that includes newLedger. If terminate() is concurrently writing the terminated metadata, the two store.asyncUpdateLedgerIds(...) calls can still race because asyncTerminate is not serialized with this metadataMutex path.

There are two problematic outcomes: if the rollover metadata update wins first, this branch only closes lh and relies on a later terminate metadata update to remove the new ledger from metadata, but the unused BookKeeper ledger is not deleted; if terminate wins the metadata version race first, this rollover callback can go through operationFailed(BadVersionException) and call handleBadVersion, fencing a ledger that should remain terminated. The TODO here is therefore part of the correctness fix, not just cleanup.

Can we make the in-flight rollover metadata update terminal-state aware as well, e.g. serialize terminate with the same metadata update path, or handle state == Terminated in both the success and BadVersion failure callbacks as a stale rollover completion, while ensuring the unused ledger is removed from metadata and deleted if it was already written?

Thanks for the detailed explanation. I’ve also identified this remaining
terminate-vs-rollover metadata race.

The ManagedLedger code path is quite complex, so I think it would be clearer
and safer to split the related fixes into smaller PRs instead of carrying all
of them in this one. This PR focuses on the terminated-state transition and
pending-add handling.

After this PR is merged, I’ll submit a follow-up PR to address the metadata
update race, including the stale rollover success path, the BadVersion path,
and cleanup of any unused ledger that may have been written to metadata.

Unresolving this comment since there's very useful observations and analysis so far and it's useful to keep the thread visible until we agree on how to address the problems in short term and long term.

I agree that the current ManagedLedgerImpl code path is quite complex. It's hard to reason about the correctness since it's a mixture of locks, synchronization and different executors.

Just wondering if terminate should be implemented in a way where the state is set to "Terminating" which would be used to prevent any new operations starting, but existing inflight operations would first be completed before the ManagedLedgerImpl could be transitioned into "Terminated" state? The transitioning from Terminating to Terminated would be handled after the metadata update has been successfully completed. If there's a crash in the metadata update, the ledger will continue to be operational. The client requesting the termination can retry if it doesn't receive a successful response to the termination request.
WDYT?

Thanks for the detailed review. I agree the terminate-vs-rollover metadata race is important.

For this PR, I would like to keep the scope focused on the immediate terminated-state and pending-add behavior: once termination has taken effect, late callbacks should not move the managed ledger back to a writable state, and pending add callbacks should complete consistently.

The metadata race is a different path and the fix is more involved. A focused follow-up PR would need to handle the terminate metadata update together with the metadataMutex-serialized metadata update path, including:

stale rollover metadata success after termination, where we may need metadata cleanup and unused BookKeeper ledger cleanup;

stale rollover metadata failure after termination wins, where BadVersionException should not fence a ledger that is already terminating/terminated.

Mixing that into this PR would make the review much harder because it combines state/callback behavior with metadata serialization and ledger cleanup. My preference is to keep this PR focused, then send a separate PR for the metadataMutex / stale rollover metadata path.

I agree that a separate Terminating state would be a cleaner long-term model.

The distinction I have in mind is:

Terminating: termination has started on this broker. New writes and new rollovers should be rejected, late callbacks should not move the ledger back to writable, and existing in-flight adds should either drain or fail deterministically.

Terminated: the terminate metadata update has succeeded and terminatedPosition is durable.

With this model, Terminating does not need to be the durable state. If the broker crashes before the metadata update succeeds, recovery can treat the ledger as non-terminated and the caller can retry termination. If the metadata update succeeds, recovery observes terminatedPosition and restores Terminated.

This would make the state machine easier to reason about, but it still needs a clear design for how it interacts with rollover callbacks, pending add entries, metadata updates, and BadVersionException handling. It also would not replace the need to fix the metadataMutex race; it mainly gives us a clearer lifecycle boundary so those cases can be handled consistently.

I think this is worth doing as a follow-up design/change.

For this PR, I would like to keep the scope focused on the immediate terminated-state and pending-add behavior: once termination has taken effect, late callbacks should not move the managed ledger back to a writable state, and pending add callbacks should complete consistently.

It just feels that this approach isn't correct. The problem itself is real.

I think this is worth doing as a follow-up design/change.

Yes, it most likely makes sense to start with the analysis and design. In Pulsar, it's most natural to document the problem, analysis and design into a PIP before implementation (prototypes and experimentation are obviously recommended and necessary when coming up with the design).

Thanks, that makes sense to me.

I’ll take a deeper look at the overall ManagedLedger termination flow and how these issues should be handled together, including the state transition model, the metadata update/locking path, and stale rollover callbacks.

After that, I’ll try to prepare a proper PIP or design proposal so we can discuss the problem, analysis, and proposed approach more clearly with the community.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

+            assertTrue(createRequested.await(5, TimeUnit.SECONDS));
+            assertEquals(ledger.getState(), ManagedLedgerImpl.State.CreatingLedger);
+
+            CountDownLatch addFailed = new CountDownLatch(1);


+                                // TODO: if this path is hit after the new ledger was already written into metadata,
+                                // delete the unused ledger together with removing it from the metadata.


void-ptr974 added 2 commits May 16, 2026 12:55

Handle adds during ManagedLedger termination

d258aa6

void-ptr974 mentioned this pull request May 23, 2026

[Bug][ML] ManagedLedger.terminate() can race with ledger rollover and make a terminated ledger writable again #25858

Open

Denovo1998 suggested changes May 26, 2026

View reviewed changes

void-ptr974 requested a review from Denovo1998 May 27, 2026 02:15

lhotari requested review from Copilot and merlimat and removed request for Denovo1998 June 3, 2026 21:08

Copilot started reviewing on behalf of lhotari June 3, 2026 21:09 View session

Copilot AI reviewed Jun 3, 2026

lhotari requested a review from Copilot June 3, 2026 21:41

Copilot started reviewing on behalf of lhotari June 3, 2026 21:41 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[fix][ml] Prevent terminated managed ledger from transitioning back to writable state#25795

[fix][ml] Prevent terminated managed ledger from transitioning back to writable state#25795
void-ptr974 wants to merge 2 commits into
apache:masterfrom
void-ptr974:fix-managed-ledger-terminate-state-race

void-ptr974 commented May 16, 2026 •

edited

Loading

Uh oh!

Denovo1998 May 26, 2026

Uh oh!

void-ptr974 May 27, 2026

Uh oh!

lhotari Jun 3, 2026

Uh oh!

void-ptr974 Jun 4, 2026

Uh oh!

lhotari Jun 5, 2026

Uh oh!

void-ptr974 Jun 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		// TODO: if this path is hit after the new ledger was already written into metadata,
		// delete the unused ledger together with removing it from the metadata.

Uh oh!

Conversation

void-ptr974 commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related issue

Motivation

Modifications

Verifying this change

Does this pull request potentially affect one of the following parts:

Uh oh!

Denovo1998 May 26, 2026

Choose a reason for hiding this comment

Uh oh!

void-ptr974 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

lhotari Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

void-ptr974 Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

lhotari Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

void-ptr974 Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

void-ptr974 commented May 16, 2026 •

edited

Loading